Annotating Near-Identity from Coreference Disagreements

نویسندگان

  • Marta Recasens
  • Maria Antònia Martí
  • Constantin Orasan
چکیده

We present an extension of the coreference annotation in the English NP4E and the Catalan AnCora-CA corpora with near-identity relations, which are borderline cases of coreference. The annotated subcorpora have 50K tokens each. Near-identity relations, as presented by Recasens et al. (2010; 2011), build upon the idea that identity is a continuum rather than an either/or relation, thus introducing a middle ground category to explain currently problematic cases. The first annotation effort that we describe shows that it is not possible to annotate near-identity explicitly because subjects are not fully aware of it. Therefore, our second annotation effort used an indirect method, and arrived at near-identity annotations by inference from the disagreements between five annotators who had only a two-alternative choice between coreference and non-coreference. The results show that whereas as little as 2–6% of the relations were explicitly annotated as near-identity in the former effort, up to 12–16% of the relations turned out to be near-identical following the indirect method of the latter effort.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Typology of Near-Identity Relations for Coreference (NIDENT)

The task of coreference resolution requires people or systems to decide when two referring expressions refer to the ‘same’ entity or event. In real text, this is often a difficult decision because identity is never adequately defined, leading to contradictory treatment of cases in previous work. This paper introduces the concept of ‘near-identity’, a middle ground category between identity and ...

متن کامل

Semantic Approach to Identity in Coreference Resolution Task

It has been recently discussed in linguistics that the notion of identity in the task of coreference resolution is of continuous nature, ranging from “complete” identity to non-identity. The current paper confronts this idea with experimental data for Polish, resulting in a new approach to the notion of identity. It extends the definition of coreference with speaker/recipient relation, believed...

متن کامل

How Dependency Trees and Tectogrammatics Help Annotating Coreference and Bridging Relations in Prague Dependency Treebank

In this paper, we explore the benefits of dependency trees and tectogrammatical structure used in the Prague Dependency Treebank for annotating language phenomena that cross the sentence boundary, namely coreference and bridging relations. We present the benefits of dependency trees such as the detailed processing of ellipses, syntactic decisions for coordination and apposition structures that ...

متن کامل

Interesting Linguistic Features in Coreference Annotation of an Inflectional Language

This paper reports on linguistic features and decisions that we find vital in the process of annotation and resolution of coreference for highly inflectional languages. The presented results have been collected during preparation of a corpus of general direct nominal coreference of Polish. Starting from the notion of a mention, its borders and potential vs. actual referentiality, we discuss the...

متن کامل

Polish Coreference Corpus

The Polish Coreference Corpus (PCC) is a large corpus of Polish general nominal coreference built upon the National Corpus of Polish. With its 1900 documents from 14 text genres, containing about 540,000 tokens, 180,000 mentions and 128,000 coreference clusters, the PCC is among the largest coreference corpora in the international community. It has some novel features, such as the annotation of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012